Beyond Traditional Models

Elizabeth King
Kevin Middleton

Randomization can apply to anything with a null hypothesis

  • Questions not typical to standard statistical tests
  • Beyond a single response variable

Example: Spatial Distribution

Are species distributions clumped across the landscape?

Possible Null Hypotheses:

  • Randomly distributed
  • Uniformly distributed

Sampling in quadrats

  • Are species clustered in quadrats?
    • i.e., do we observe more occurrences in a few quadrats than expected by chance?
  • Seabirds in the Anadyr Strait

Species - quadrat curve

\[s^*(m) = \displaystyle\sum^{K}_{i=1} \left[ 1 - \frac{\binom{M - L_i}{m}}{\binom{M}{m}} \right] \]

where:

  • \(M\) is the number of quadrats
  • \(m\) is the number of quadrats sampled (from 1 to \(M\))
  • \(L_i\) is the number of quadrats species i is found in
  • \(K\) is the total number of species
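As a rough illustration, the formula can be written as a short R function. The name `species_quadrat_curve` and the toy values are hypothetical, and this is not necessarily the `sm()` function used in the randomization code; it is only a sketch of the equation as stated, assuming `L` holds one occupancy count per species.

```r
# Expected number of species found in m quadrats, for m = 1..M.
# M: total number of quadrats
# L: vector with the number of quadrats each species occupies (length K)
species_quadrat_curve <- function(M, L) {
  m <- seq_len(M)
  sapply(m, function(mm) {
    # choose() returns 0 when mm > M - L_i, so a species is certain
    # to be found once more quadrats are sampled than it is absent from
    sum(1 - choose(M - L, mm) / choose(M, mm))
  })
}

# Toy example (made-up values): 4 quadrats, one species present in all
# of them and one species present in a single quadrat
toy_curve <- species_quadrat_curve(M = 4, L = c(4, 1))
toy_curve
```

The curve rises from 1.25 expected species in one quadrat to 2 species once all four quadrats are sampled.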

Randomize

  • Randomly assign observations to quadrats & recalculate the curve
  • Repeat many times and calculate a quantile curve

Randomize

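# Assumes birds, quads, and sm() are already defined
library(dplyr)  # count(), group_by(), summarise()

niter <- 1000  # number of randomizations

birds.s <- birds
# One row per randomization, one column per number of quadrats sampled
out.s <- matrix(data = NA, nrow = niter, ncol = quads)

for(ii in 1:niter){

  # Reassign every observation to a randomly chosen quadrat
  birds.s$quadrat <- sample(1:quads, size = nrow(birds), replace = TRUE)
  # Number of observations per species
  nsps <- birds.s |>
    count(Species)
  # Number of quadrats each species occupies after shuffling
  pres <- birds.s |>
    group_by(Species) |>
    summarise("pres" = length(unique(quadrat)))
  # Species-quadrat curve for this randomization
  out.s[ii,] <- sm(quads, nsps$n, pres$pres)

}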
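One possible way to get the quantile curve from a matrix shaped like `out.s` is to take quantiles column by column. The sketch below uses a made-up stand-in matrix (`out_toy`) and assumed 2.5%, 50%, and 97.5% cutoffs; neither comes from the bird data.

```r
# Toy stand-in for out.s: 1000 randomized curves over 10 quadrats
# (values are simulated purely for illustration)
set.seed(1)
niter <- 1000
quads <- 10
out_toy <- t(replicate(niter, cumsum(runif(quads))))

# Column-wise quantiles: for each number of quadrats, the lower bound,
# median, and upper bound across all randomizations
envelope <- apply(out_toy, 2, quantile, probs = c(0.025, 0.5, 0.975))
dim(envelope)  # 3 rows (quantiles) by 10 columns (quadrat counts)
```

An observed curve falling below the lower row of `envelope` would suggest fewer species per quadrat than the random null predicts, which is consistent with clumping.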

Visualize the curves

Example: Comparing Fold Changes

  • Gene expression projects typically calculate \(\log_2\) fold changes to compare between groups
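  • Ratios like this are problematic for analysis: two ratios that share a denominator are correlated even when the underlying variables are independent

library(tidyverse)  # tibble, dplyr, ggplot2

set.seed(12439)

# Three independent normal variables: aa, bb, and cc are unrelated
rr <- tibble(pid = rep(letters[1:6], 100),
             cc = rnorm(100 * 6, 10),
             aa = rnorm(100 * 6, 10),
             bb = rnorm(100 * 6, 10))

# Two ratios that share the denominator cc
rr <- rr |>
  mutate(aa_cc = aa / cc, bb_cc = bb / cc)

# Plot one ratio against the other, one panel per pid
rr |> ggplot(aes(aa_cc, bb_cc)) +
  geom_point() +
  geom_smooth(method = lm) +
  facet_wrap(~ pid, ncol = 3)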
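The same point can be checked numerically. This base R sketch uses an arbitrary seed and a larger, assumed sample size than the plot so that the estimates are stable; the variable names mirror the simulation but the values are independent of it.

```r
set.seed(2024)  # arbitrary seed for this sketch
n <- 10000      # assumed sample size, larger than in the plotted example

# Three independent variables with mean 10 and sd 1
cc <- rnorm(n, mean = 10)
aa <- rnorm(n, mean = 10)
bb <- rnorm(n, mean = 10)

# The raw variables are essentially uncorrelated ...
cor_raw <- cor(aa, bb)
# ... but dividing both by the same cc induces a clear positive correlation
cor_ratio <- cor(aa / cc, bb / cc)

c(raw = cor_raw, ratio = cor_ratio)
```

Because the shared denominator alone produces this correlation, an observed association between fold changes has to be judged against a null that already includes it.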

Example: Comparing Fold Changes

Gene expression in different diets

  • 6 samples in each of 3 diets
  • 3 tissues

What is the null expectation for fold changes?

  • 6 samples in each of 3 diets
    • Easy to get a random draw that mimics the observed setup
    • For each randomization, randomly choose 2 samples from each original diet to assign to each new diet (~800 possible combos)
    • 100 randomizations (limited by computation time)
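A minimal sketch of one such draw, assuming a sample table with one row per sample. The data frame `samples`, the diet labels A, B, and C, and the function `randomize_diets` are hypothetical stand-ins, not the project's actual objects.

```r
set.seed(42)  # arbitrary seed for this sketch

# Hypothetical design: 6 samples in each of 3 diets
samples <- data.frame(
  sample_id = paste0("s", 1:18),
  diet = rep(c("A", "B", "C"), each = 6)
)

# For each original diet, shuffle a balanced set of new labels so that
# an equal number of its samples (here 2) goes to each new diet
randomize_diets <- function(diet) {
  labels <- sort(unique(diet))
  new_diet <- character(length(diet))
  for (d in labels) {
    idx <- which(diet == d)
    per_label <- length(idx) / length(labels)
    new_diet[idx] <- sample(rep(labels, each = per_label))
  }
  new_diet
}

samples$new_diet <- randomize_diets(samples$diet)

# Every cell of the cross-tabulation should equal 2
table(samples$diet, samples$new_diet)
```

Each new diet then contains 2 samples from every original diet, so any fold change computed between the new groups reflects only sampling noise and the ratio structure, not a diet effect.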

Randomizations